F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification

Authors

Dongmei Wang

Philipos C. Loizou

John H. L. Hansen

Abstract

In this study, we propose a frequency-domain F0 estimation approach based on long-term harmonic feature analysis combined with artificial neural network (ANN) classification. Long-term spectrum analysis is used to obtain better harmonic resolution, which reduces the spectral interference between speech and noise. Next, pitch candidates are extracted for each frame from the long-term spectrum. Five features related to harmonic structure are computed for each candidate and combined into a feature vector that indicates the status of that candidate. An ANN is trained to model the relation between the harmonic features and the true pitch values. In the test phase, the target pitch is selected from the candidates according to the maximum output score of the ANN. Finally, post-processing based on the average segmental output is applied to eliminate inconsistent or fluctuating decision errors. Experimental results show that the proposed algorithm outperforms several state-of-the-art F0 estimation methods under adverse conditions, including white noise and multi-speaker babble noise.
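The candidate-scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the five harmonic features or the ANN, so a plain harmonic-sum score stands in for both, and the post-processing step is omitted. All function names, the 100 ms window, the 60 to 400 Hz search range, and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np


def synth_voiced(f0, fs=8000, dur=0.1, n_harm=5, noise_std=0.3, seed=0):
    """Hypothetical test signal: n_harm harmonics of f0 (amplitude 1/k) plus white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))
    return x + noise_std * rng.standard_normal(t.size)


def long_term_spectrum(x, fs, n_fft=8192):
    """Magnitude spectrum of a Hann-windowed segment, zero-padded to n_fft points."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, mag


def harmonic_score(freqs, mag, f0, n_harm=5):
    """Stand-in for the ANN score: summed magnitude at the first n_harm multiples of f0."""
    df = freqs[1] - freqs[0]  # spacing between FFT bins in Hz
    idx = np.round(np.arange(1, n_harm + 1) * f0 / df).astype(int)
    idx = idx[idx < len(mag)]  # ignore harmonics above the Nyquist frequency
    return float(mag[idx].sum())


def pick_f0(x, fs, f_min=60.0, f_max=400.0):
    """Score every candidate on a 1 Hz grid and return the highest-scoring one."""
    freqs, mag = long_term_spectrum(x, fs)
    candidates = np.arange(f_min, f_max, 1.0)
    scores = [harmonic_score(freqs, mag, f) for f in candidates]
    return float(candidates[int(np.argmax(scores))])
```

Calling `pick_f0(synth_voiced(150.0), 8000)` should return a value close to 150 Hz for this synthetic signal. In the approach described above, a trained classifier operating on several harmonic features would take the place of `harmonic_score`.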

Related articles

Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners

This study proposes a speech enhancement algorithm to improve speech intelligibility for hearing impaired listeners in adverse conditions. The proposed algorithm is based on a long term harmonic model, where the harmonics of target speech are more distinguished from noise spectrum interference. Our method consists of two stages: i) Prominent pitch estimation based on long term harmonic feature ...

A new method for robust speech recognition based on missing data using a bidirectional neural network

The performance of speech recognition systems degrades sharply when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

A Convolutional Neural Network based on Adaptive Pooling for Classification of Noisy Images

Convolutional neural network is one of the effective methods for classifying images that performs learning using convolutional, pooling and fully-connected layers. All kinds of noise disrupt the operation of this network. Noise images reduce classification accuracy and increase convolutional neural network training time. Noise is an unwanted signal that destroys the original signal. Noise chang...

Publication date: 2014
